Abstract

Purpose: The three most widely used utility measures are the Health Utilities Index Mark 2 and 3 (HUI2 and HUI3), 
the EuroQol-5D (EQ-5D) and the Short-Form-6D (SF-6D). In line with guidelines for economic evaluation from 
agencies such as the National Institute for Health and Clinical Excellence (NICE) and the Canadian Agency for Drugs 
and Technologies in Health (CADTH), these measures are currently being used to evaluate the cost-effectiveness of 
different interventions in MS. However, the challenge of using such measures in people with a specific health 
condition, such as MS, is that they may not capture all of the domains that are affected by the condition. If 
important domains are missing from the generic measures, the value derived will be higher than the real impact warrants, 
creating invalid comparisons across interventions and populations. Therefore, the objective of this study was to 
estimate the extent to which generic utility measures capture important domains that are affected by MS. 

Methods: The available study population consisted of men and women who had been registered after 1994 in 
three participating MS clinics in Greater Montreal, Quebec, Canada. Subjects were first interviewed on an 
individualized measure of quality of life (QOL) called the Patient Generated Index (PGI). The domains identified with 
the PGI were then classified and grouped together using the World Health Organization's International Classification 
of Functioning, Disability and Health (ICF), and mapped onto the HUI2, HUI3, EQ-5D and SF-6D. 

Results: A total of 185 persons with MS were interviewed on the PGI. The sample was relatively young (mean age 
43) and predominantly female. Both men and women had mild disability with a median Expanded Disability Status 
Scale (EDSS) score of 2. The top 10 domains that patients identified as the most affected by their MS were work 
(62%), fatigue (48%), sports (39%), social life (28%), relationships (23%), walking/mobility (22%), cognition (21%), 
balance (14%), housework (12%) and mood (11%). The SF-6D included the greatest number of domains (6 domains) 
important to people with MS, followed by the EQ-5D (4 domains) and the HUI2 (4 domains) and then the HUI3 
(3 domains). The mean and standard deviation (SD) for the PGI, EQ-5D and the SF-6D were 0.50 (SD 0.25), 0.69 
(SD 0.18) and 0.69 (SD 0.13), respectively. The magnitude of difference between the PGI and the generic utility measures 
was large and statistically significant. 

Conclusion: Although the generic utility measures included certain items that were important to people with MS, 
there were several that were missing. An important consequence of this mismatch was that values of QOL derived 
from the PGI were importantly and significantly lower than those estimated using any of the generic utility 
measures. This could have a substantial impact in evaluating the effect of interventions for people with MS. 
Introduction

Multiple sclerosis (MS) is a chronic disease resulting 
from inflammation and demyelination in the central ner- 
vous system (CNS) [1] that is associated with a variety of 
symptoms, such as fatigue, impaired mobility and cogni- 
tive decline [2]. Several new therapies, behavioural [3-9], 
medical [10-14], and surgical [15-19], have been devel- 
oped in the field of MS. As there are both benefits and 
harms from interventions, the importance of considering
the patient's perspective in the evaluation of these new
therapies is increasingly being emphasized. Patient-reported
outcomes are used to evaluate the patient's perspective
on the impact of the disease and its treatment
on symptoms, function, and other aspects of quality of 
life (QOL). QOL is defined as an "individuals' perception 
of their position in life in the context of the culture in 
which they live and in relation to their goals, expecta- 
tions, standards and concerns [20]." QOL is a global 
construct that includes domains other than health such 
as job satisfaction, quality of housing, and the neighbor- 
hood in which one lives [21]. Health-related quality of 
life (HRQL), on the other hand, is a construct that is 
narrower and focuses on domains within the purview of 
the health care system, such as normal ranges for 
physiological variables, physical, mental and social well- 
being [22,23]. Health status, a term often confused with 
HRQL, is a description and/or measurement of the 
health of an individual or population at a particular 
point in time against identifiable standards [24]. 

While there is a common set of domains that are
relevant across a wide variety of health conditions, 
including none, these domains may be affected differen- 
tially because of the positive and negative effects of 
interventions. For example, a treatment may have a posi- 
tive effect on one domain (e.g. mental health) but a 
negative one on another (e.g. physical health) and this 
would be condition and intervention specific. 

The most widely used methodology to create an index 
that weighs gains in one domain against losses in an- 
other is based on utility theory. Utility measures (or 
preference-based measures) provide a single value for 
the construct (health status, HRQL, or QOL) ranging 
from 0 (for death or worst possible health state) to 1 (for 
perfect health or best possible health state) [25-29]. This 
value is used to calculate what is termed a "Quality- 
Adjusted Life Year" (QALY) which captures the effect of 
an intervention on quantity of life (mortality) and "qual- 
ity of life" (which is conceptualized as morbidity) 
[30-33]. The "Q" in QALY is a misnomer, given that it
measures only the health aspects of QOL; the other aspects,
which have been elegantly identified by Flanagan, are
physical and material well-being, relations with other
people, social, community and civic activities, personal
development and fulfillment, and recreation [34].
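
As an illustration of how a utility value feeds into a QALY, the following is a minimal sketch, assuming the simplest undiscounted form in which each utility value is multiplied by the time spent in that state and the products are summed. The function name `qalys` and the example profile are hypothetical and are not taken from the study.

```python
def qalys(health_states):
    """Return quality-adjusted life years for a sequence of health states.

    Each state is a (utility, years) pair: the utility value assigned to
    the state and the time spent in it. No discounting is applied.
    """
    total = 0.0
    for utility, years in health_states:
        if years < 0:
            raise ValueError("time spent in a state cannot be negative")
        total += utility * years
    return total


# Hypothetical profile: 2 years at utility 0.5, then 4 years at 0.75.
example_profile = [(0.5, 2.0), (0.75, 4.0)]
example_qalys = qalys(example_profile)  # 0.5 * 2 + 0.75 * 4 = 4.0
```

Because the utility value multiplies every year lived, any systematic overestimate of utility carries directly into the QALY total.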



The three most widely used utility measures, namely the 
Health Utilities Index Mark 2 and 3 (HUI2 and HUI3), the
EuroQol-5D (EQ-5D) and the Short-Form-6D (SF-6D), 
label the constructs underlying these measures as health 
status and/or HRQL [35-39]. None list QOL as the con- 
struct being measured. Yet, for economic evaluation, the 
QALY is the parameter calculated and compared with cost. 

In line with guidelines for economic evaluation from 
agencies such as the National Institute for Health and 
Clinical Excellence (NICE) and the Canadian Agency for 
Drugs and Technologies in Health (CADTH), these 
measures are currently being used to evaluate the cost- 
effectiveness of different interventions in MS. However, 
the challenge of using such measures in people with a 
specific health condition, such as MS, is that they may 
not capture all of the domains that are impacted upon 
by the health condition. If important domains are missing
from the generic measures, the value derived will be
higher than the real impact warrants, creating invalid
comparisons across interventions and populations.

Personalized measures have been proposed as a 
method for identifying those aspects of a health condi- 
tion that impact on QOL. While they may differ from 
person to person and across health conditions, the value 
derived from them represents QOL. The most com- 
monly used individualized measures of QOL are the 
Patient Generated Index (PGI) and the Schedule for the 
Evaluation of Individual Quality of Life-Direct Weighting 
(SEIQOL-DW). Both measures capture the individual's
perspective on QOL by permitting him/her to nominate
the areas of life that are most important and assign a
weight to each domain. Personalized measures of QOL
have been used in several clinical trials to evaluate the
effectiveness of different interventions on overall QOL
[40-44]. Furthermore, these measures have been shown to be
particularly useful in clinical settings by improving
patient-physician communication and by helping prioritize
treatment options [45-47].

The global aim of the study was to contribute evidence
for the content validity of generic utility measures with 
respect to capturing the relevant domains for people 
with MS. The specific objective was to estimate the ex- 
tent to which generic utility measures capture important 
domains that are affected by MS. 

Methods 

Subjects 

The data for this study come from a study of the life
impact of MS among people diagnosed during the era of
magnetic resonance imaging (MRI) and disease modifying
therapies (the New MS) [48]. The available study
population consisted of both men and women who had 
been registered after 1994 at the three participating MS 
clinics in Greater Montreal, Quebec, Canada. The study 
was approved by all regional ethics committees. Inclusion
criteria for the study were diagnosis of MS or Clinically 
Isolated Syndrome (CIS) after 1994. From a pool of 5000 
patients, a centre-stratified random sample of 550 patients 
was drawn, of which 394 were contacted. From those who 
were contacted, the first 192 persons who responded were 
enrolled, 189 completed all questionnaires and 185 came 
for an interview. Respondents and non-respondents were 
compared and no clinically or statistically significant 
differences were found between the two groups on 
socio-demographic characteristics. 

Measurement 

Patient generated index 

The PGI is an individualized measure of HRQL that was 
administered in three stages. In the first stage, patients 
were asked to identify up to five of the most important 
areas of their lives affected by MS. In the second stage, 
patients were asked to rate how badly affected they were 
in each of the selected areas on a scale of 0 to 10, where 
0 was the worst they can imagine and 10 exactly as they 
would like to be. A sixth box was provided to rate all 
other health or non-health related areas. In the third 
stage, they were given twelve spending "points" or 
"tokens" to distribute among the areas identified. The 
tokens that they allocated to each area represented the 
relative importance of potential improvements in the 
chosen area. The more tokens a patient spent on an
area, the more important that area was; the fewer tokens
a patient spent, the less important that area was. The
rating for each area was multiplied by the proportion of 
"points" for that area, which were then summed together 
to produce an index from 0 to 100 [49]. For ease of 
comparison with the utility measures, PGI scores in this 
study were presented on a scale from 0 to 1. 
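
The scoring rule described above can be sketched as follows. This is a minimal illustration, assuming that the token-weighted mean rating (0 to 10) is rescaled linearly to reach the 0 to 100 index and then the 0 to 1 scale used in this study; the function name `pgi_index` and the example respondent are hypothetical.

```python
def pgi_index(ratings, tokens, total_tokens=12):
    """Return a PGI score on a 0-1 scale.

    ratings: one 0-10 rating per nominated area (10 = exactly as the
             person would like to be).
    tokens:  spending points allocated to each area; must sum to
             total_tokens.
    """
    if len(ratings) != len(tokens):
        raise ValueError("ratings and tokens must have the same length")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must lie between 0 and 10")
    if any(t < 0 for t in tokens) or sum(tokens) != total_tokens:
        raise ValueError("tokens must be non-negative and sum to total_tokens")

    # Rating multiplied by the proportion of points, summed over areas.
    weighted_sum = sum(r * t for r, t in zip(ratings, tokens))
    weighted_rating = weighted_sum / total_tokens  # 0 to 10

    # Assumed linear rescaling: x10 gives the 0-100 index, /100 gives 0-1.
    return weighted_rating / 10.0


# Hypothetical respondent: five nominated areas plus the sixth box.
example_ratings = [4, 5, 3, 6, 5, 7]
example_tokens = [3, 4, 2, 1, 2, 0]
example_score = pgi_index(example_ratings, example_tokens)  # 54 / 12 / 10
```

An area that receives no tokens, such as the sixth box in this example, contributes nothing to the index regardless of its rating.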

EQ-5D 

The EQ-5D is a generic preference-based measure of
HRQL that consists of two parts [50,51]. The first part
includes 5 separate domains: mobility, capacity for
self-care, conduct of usual activities, pain/discomfort and
anxiety/depression. Each domain has 3 levels: no problems,
some problems, extreme problems. The second
part consists of a Visual Analogue Scale (EQVAS) to
measure self-perceived health on a vertical scale from 0
to 100, where 0 is the worst imaginable health state and
100 is the best imaginable health state. The EQ-5D
defines 243 health states and has a range from -0.6 to 1.0.

SF-6D 

The SF-6D is a generic preference-based measure derived
from the SF-36 Health Survey (or RAND-36)
[23,39]. The SF-6D has 6 domains: physical functioning,
role limitation, social functioning, pain, mental health
and vitality. Each domain has between 4 and 6 levels.
The index defines 18 000 health states and has a range
from 0.3 to 1.0.
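
The number of health states each classification defines is the product of the number of levels across its domains. The sketch below checks the two counts quoted above. The EQ-5D levels come from the text; the per-domain split of SF-6D levels is an assumption (the text states only that each domain has between 4 and 6 levels), chosen to be consistent with the total of 18 000.

```python
from itertools import product
from math import prod

# EQ-5D: 5 domains with 3 levels each (as described in the text).
EQ5D_LEVELS = {
    "mobility": 3,
    "self-care": 3,
    "usual activities": 3,
    "pain/discomfort": 3,
    "anxiety/depression": 3,
}

# SF-6D: per-domain level counts are an assumption, not given in the text.
SF6D_LEVELS = {
    "physical functioning": 6,
    "role limitation": 4,
    "social functioning": 5,
    "pain": 6,
    "mental health": 5,
    "vitality": 5,
}


def count_states(levels):
    """Number of distinct health states for a descriptive system."""
    return prod(levels.values())


# Enumerate the EQ-5D states as 5-digit profiles, "11111" to "33333".
eq5d_states = ["".join(map(str, s)) for s in product(range(1, 4), repeat=5)]
```

Here `count_states(EQ5D_LEVELS)` gives 3^5 = 243 and `count_states(SF6D_LEVELS)` gives 6 x 4 x 5 x 6 x 5 x 5 = 18 000.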

Procedure 

Figure 1 presents a flowchart of the study procedure. 

Subjects were first interviewed on an individualized
measure of QOL, the PGI [49]. The domains identified
with the PGI were then classified and grouped together
using the World Health Organization's International
Classification of Functioning, Disability and Health (ICF)
[52] independently by four raters. This methodology
closely followed that of Mayo et al. [53],
which evaluated the extent to which HRQL measures
captured constructs beyond symptoms and function.
The ICF provided a coding framework and standardized 
description of health related problems at the level of 
body structure/function (e.g. fatigue, cognition), activity 
(e.g. dressing, feeding, walking) and participation (e.g. 
school, work). These levels are also known as impair- 
ments, activity limitations and participation restrictions, 
respectively. Any discrepancies between raters were 
resolved by discussion. 

Last, the domains were mapped onto the HUI2, HUI3,
EQ-5D and SF-6D, which had been previously mapped
to the ICF [53]. The extent to which these utility mea- 
sures captured domains important to patients with MS 
was qualitatively appraised. 

Data analysis 

We had data on hand for the PGI, the EQ-5D and the
SF-6D (derived from the RAND-36). As all three
measures were administered to the same individual,
generalized estimating equations (GEE) were used to adjust
the variance for the clustering of outcomes within persons.
The advantage of using GEE, as opposed to the
paired t-test, was that it allowed for simultaneous assessment
of, and correlation among, all 3 measures. The regression
coefficients produced in the model were estimates
of the difference between measures (with 95% CI)
adjusted for the correlation among data points. An effect
size (ES) was then calculated using the t-statistic, which
was equal to the adjusted regression coefficient divided
by its SE.

1. Subjects interviewed on the Patient Generated Index and
domains important to MS identified
2. Domains classified using the International Classification
of Functioning, Disability and Health (ICF)
3. Domains mapped onto the HUI2, HUI3, EQ-5D and SF-6D

Figure 1 Flowchart of the study procedure.

Results 

A total of 185 persons with MS were interviewed on the 
PGI. The sample was relatively young (mean age 43) and 
predominantly female. Both men and women had mild 
disability with a median Expanded Disability Status Scale 
(EDSS) score of 2. The average number of years since 
diagnosis was 6 years, and 59% of the sample was on 
Disease Modifying Therapies. Demographic and clinical 
characteristics are presented in Table 1. 

Table 2 presents the top 10 domains that patients
identified as the most affected by their MS. These
areas were work (62%), fatigue (48%), sports (39%), social
life (28%), relationships (23%), walking/mobility
(22%), cognition (21%), balance (14%), housework (12%)
and mood (11%). The mean impact score for each
domain (from 0 to 10) ranged from 3.9 to 5.0. In terms 
of the mean number of points spent for each domain, 
patients spent the most points (4.3) to improve their 
relationships, followed by fatigue (3.8) and then walking 
(mean 3.6). 



Table 1 Demographic and clinical characteristics of
sample (n = 185)

Characteristics                       Mean (SD) or N (%)

Age (y)                               42.8 (10.0)
Women/Men                             137/48 (74/26)
Definite MS/CIS                       170/15 (92/8)
Years since diagnosis                 6.2 (3.6)
EDSS, median (IQR)                    2.0 (1.0-3.5)
On DMT/Not on DMT/No information      110/19/56 (59/10/30)
Patient Generated Index*              0.50 (0.25)
EQ-5D**                               0.69 (0.18)
SF-6D***                              0.69 (0.13)

SD, standard deviation; N, number; CIS, Clinically Isolated Syndrome; EDSS,
Expanded Disability Status Scale; IQR, Inter-quartile range; DMT, Disease
Modifying Therapies.

*Transformed to a scale from 0 to 1, higher scores are better (1 = perfect QOL).
**Measured on a scale from -0.4 to 1, higher scores are better
(1 = perfect health).
***Measured on a scale from 0.3 to 1, higher scores are better
(1 = perfect health).



Table 3 presents the results for the mapping of the 10
domains identified by MS patients against the HUI2,
HUI3, EQ-5D and the SF-6D. School/work was found in
the EQ-5D and SF-6D but not in the HUI2 or HUI3.
Fatigue was found in the SF-6D but not in the EQ-5D or
the HUI measures. Sports, which was the third most
frequently reported domain, was found only in the SF-6D
and HUI2. Social life was included in the EQ-5D and the
SF-6D, but not in the HUI measures. Cognition was
found in the HUI measures, but not in the EQ-5D or
the SF-6D. Housework was included in the EQ-5D and
the SF-6D, but not in the HUI2 or HUI3. Relationships
and balance were not included in any of the utility
measures. Mood was the only domain that was included in
all of the measures.

The SF-6D included the greatest number of domains
(6 domains) important to people with MS, followed
by the EQ-5D (4 domains) and the HUI2 (4 domains),
and then the HUI3 (3 domains).
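
The domain counts above can be reproduced by tallying the Yes/No entries of Table 3. The sketch below is illustrative only: the cell values are a transcription of Table 3 as read from the extracted text, and the names `TABLE3`, `coverage` and `missing_everywhere` are hypothetical.

```python
MEASURES = ("HUI2", "HUI3", "EQ-5D", "SF-6D")

# One string per MS domain; each character is the Y/N entry for the
# corresponding measure in MEASURES (transcribed from Table 3).
TABLE3 = {
    "School/Work":   "NNYY",
    "Fatigue":       "NNNY",
    "Sports":        "YNNY",
    "Social life":   "NNNY",
    "Relationships": "NNNN",
    "Cognition":     "YYNN",
    "Walking":       "YYYN",
    "Housework":     "NNYY",
    "Balance":       "NNNN",
    "Mood":          "YYYY",
}


def coverage(table):
    """Count, for each measure, how many MS domains it includes."""
    return {
        measure: sum(row[i] == "Y" for row in table.values())
        for i, measure in enumerate(MEASURES)
    }


def missing_everywhere(table):
    """List the MS domains that no measure includes."""
    return [domain for domain, row in table.items() if "Y" not in row]


domain_counts = coverage(TABLE3)
uncovered = missing_everywhere(TABLE3)
```

With these entries, `domain_counts` matches the "Total Yes" row of Table 3 (4, 3, 4 and 6), and `uncovered` contains relationships and balance.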

The generic utility measures included domains that 
were not identified to be important by the sample, such 
as pain, self-care, vision, hearing, manual dexterity, 
speech and fertility. 

The correlation between the SF-6D and the EQ-5D
was 0.58. As demonstrated in Figure 2a, although the
relationship between the measures was somewhat linear,
discrepancies in scores between the two measures were
evident. At the upper end of the scales, a number of
individuals who had utility scores of 0.85 on the EQ-5D 
had scores as low as 0.6 on the SF-6D. A clinically 
meaningful difference on utility measures is 0.03, indi- 
cating that the difference in scores between the two util- 
ity measures was important. Discrepancies were also 
observed at the lower end of the scale, where an individ- 
ual with a score of 0.12 on the EQ-5D had a score of 
0.55 on the SF-6D. 

The correlation between the PGI and the EQ-5D was
0.53. As presented in Figure 2b, there were important
discrepancies in scores between the two measures:
several individuals with very low scores on the PGI (as
low as 0.1) had very high scores on the EQ-5D (as high
as 0.8). Pearson's correlation between the PGI and the
SF-6D was 0.53. Similar to what was observed for the
EQ-5D, there were discrepancies in scores between the
2 measures, particularly towards the lower end of the
scales (Figure 2c).

Table 2 Top 10 domains identified by subjects using the
Patient Generated Index

Domain           Proportion of subjects   Degree to which          Number of
                 reporting problem        subjects are affected    tokens spent
                 N (%)                    Mean (SD)*               Mean (SD)**

School/Work      114 (62)                 4.2 (3.4)                1.7 (2.0)
Fatigue          88 (48)                  4.5 (2.2)                3.8 (2.7)
Sports           73 (39)                  4.1 (2.6)                2.9 (2.4)
Social life      52 (28)                  4.7 (2.4)                1.8 (2.6)
Relationships    43 (23)                  4.8 (3.4)                4.3 (2.6)
Walking          41 (22)                  3.9 (2.5)                3.6 (2.5)
Cognition        39 (21)                  4.7 (2.1)                2.8 (2.2)
Balance          25 (14)                  5.0 (2.3)                2.5 (3.3)
Housework        23 (12)                  4.8 (2.1)                1.3 (1.0)
Mood             21 (11)                  4.6 (2.4)                3.4 (2.6)

*Scored out of 10, higher is better (not affected).
**Scored out of 12, higher indicates that the domain was more important.

The impact of a mismatch between domains provided
in the generic utility measures and those that are
important to people with MS is illustrated by the total
scores of the measures. As seen in Figure 3, the mean
and standard deviation (SD) for the PGI, EQ-5D and the
SF-6D were 0.50 (SD 0.25), 0.69 (SD 0.18) and 0.69 (SD
0.13), respectively. The magnitude of difference between
the PGI and the 2 utility measures was 0.19 (95% CI
0.16 to 0.22) with ES equal to 12.
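
As a rough check on the reported effect size, the ES can be recovered from the difference and its confidence interval. The sketch below assumes that the 95% CI is a symmetric normal-based interval (estimate ± 1.96 × SE), which the text does not state explicitly; the function names are hypothetical.

```python
Z_95 = 1.96  # normal quantile assumed for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Back-calculate a standard error from a symmetric confidence interval."""
    return (upper - lower) / (2 * z)


def effect_size(coefficient, standard_error):
    """ES as defined in the text: regression coefficient divided by its SE."""
    return coefficient / standard_error


# Reported difference between the PGI and the utility measures.
difference = 0.19
ci_lower, ci_upper = 0.16, 0.22

se = se_from_ci(ci_lower, ci_upper)   # roughly 0.015
es = effect_size(difference, se)      # roughly 12.4
```

Under this assumption the back-calculated ES of about 12.4 is consistent with the reported value of 12.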

This mismatch was also present at the item level. A 
total of 41 subjects (22% of the sample) reported walking 
to be an important aspect of their QOL. The distribution 
of scores on the degree to which walking was affected 
for these subjects is presented in Figure 4. The impact 
was measured on a scale from 0 to 10 on the PGI, where 
0 was the worst they could imagine and 10 was exactly 
as they would like to be. These scores were compared
with the responses on the EQ-5D mobility item. Twelve
of the 41 subjects reported having no problems with
walking on the EQ-5D. These people were expected to have
a score of 10 on the PGI, yet only 1 person reported a
score of 10. All other subjects reported lower scores,
some as low as 3 (poor).

Discussion 

In this study, subjects with MS were interviewed on an 
individualized measure to evaluate the impact of the dis- 
ease on their QOL. The results of the interview gener- 
ated a list of domains that were most important to the 
QOL of persons with MS. The domains identified were 
work, fatigue, sports, social life, relationships, walking, 
cognition, balance, housework and mood. These were 
then mapped onto generic utility measures to estimate 
the extent to which they captured domains that were 
important to persons with MS. 

There was no one generic utility measure that cap- 
tured all of the domains important to persons with MS. 



Table 3 The domains identified by MS subjects compared
with items in generic utility measures

Measure             HUI2              HUI3              EQ-5D        SF-6D
Construct           Health status     Health status     HRQL         Health status
                    & HRQL [35,36]    & HRQL [36,37]    [38]         [39]

MS Domains
School/Work         N                 N                 Y            Y
Fatigue             N                 N                 N            Y
Sports              Y                 N                 N            Y
Social life         N                 N                 N            Y
Relationships       N                 N                 N            N
Cognition           Y                 Y                 N            N
Walking             Y                 Y                 Y            N
Housework           N                 N                 Y            Y
Balance             N                 N                 N            N
Mood*               Y                 Y                 Y            Y
Total Yes           4                 3                 4            6
(out of 10)

Not MS Domains
Pain                Y                 Y                 Y            Y
Self-care           Y                 N                 Y            Y
Vision              Y                 Y                 N            N
Hearing             Y                 Y                 N            N
Manual dexterity    N                 Y                 N            N
Speech              Y                 Y                 N            N
Fertility           Y                 N                 N            N

MS Domains ordered from the largest to the smallest proportion of people
with MS who identified that domain.

Y, Yes; N, No; HUI2, Health Utilities Index Mark 2; HUI3, Health Utilities Index
Mark 3; EQ-5D, EuroQol-5D; SF-6D, Short-Form 6D.
*In the HUI3 this was happiness.



For example, fatigue, which affects 75 to 90% of patients
with MS [54-57], was not included in the EQ-5D or the
HUI measures. Walking, another commonly reported
symptom, was not found in the SF-6D. Cognition was
not found in the EQ-5D or the SF-6D. Work, sports,
and social life were not found in the HUI2 or HUI3.
This was not surprising, as the HUI measures were
developed with the intention of evaluating 'within-the-skin'
experiences that excluded social interaction [58-60].
Balance and relationships were not included in any of the
utility measures.

The generic utility measures were clearly missing
domains that were important to people with MS. Out of
the 10 domains that persons with MS identified as being
central to their QOL, only 4 were included in the HUI2,
3 in the HUI3, 4 in the EQ-5D and 6 in the SF-6D.
Furthermore, the generic utility measures included several
domains that were not important to the persons
sampled in the study, such as pain, self-care, hearing and
manual dexterity.

Figure 2 Relationship between the EQ-5D, the SF-6D and the Patient Generated Index. a: Scatter plot of the relationship between the
EQ-5D and the SF-6D (correlation coefficient = 0.584). b: Scatter plot of the relationship between the Patient Generated Index and the EQ-5D
(correlation coefficient = 0.531). c: Scatter plot of the relationship between the Patient Generated Index and the SF-6D (correlation coefficient = 0.529).

Figure 3 Mean and standard deviation values for the PGI,
EQ-5D and SF-6D, with differences and 95% CI calculated using
generalized estimating equations. Values shown: PGI 0.50 (0.25);
EQ-5D 0.69 (0.18); SF-6D 0.69 (0.13); mean difference between the
EQ-5D and SF-6D = 0.00 (95% CI -0.02 to 0.02).

To tackle the issue of lack of content validity, one 
emerging area of interest in the literature is the develop- 
ment of disease specific "bolt-ons" or dimension exten- 
sions to generic utility measures [51]. Another emerging 
area of interest is the development of disease-specific 
utility measures, which have been developed for stroke 
[61], pulmonary hypertension [62], asthma [63], rhinitis 
[64], urinary incontinence [65] and erectile dysfunction 
[66]. Recently, Versteegh et al. [67] derived an MS-specific
utility measure from the Multiple Sclerosis Impact
Scale-29 (MSIS-29) using Rasch analysis. The authors
selected 8 out of 29 items from the original questionnaire.
Some important dimensions such as social life,
work and mood were included, while others such as
walking, sports and physical fatigue were omitted.

There are several potential benefits to using disease spe- 
cific utility measures in clinical and cost-effectiveness re- 
search. First, disease specific utility measures are designed 
to include domains that are specific to a disease, and 
therefore, are likely to be more sensitive to smaller change 
over time than generic measures. Second, not only do
these measures provide descriptive information on the
various dimensions of health, but they also provide a value
for each one, thus allowing trade-offs to be made between
the domains. Disease-specific utility measures have the
potential to overcome one of the challenges associated with
disease-specific health profiles: that domains cannot be
combined into a single index, which makes it difficult to
conclude whether an intervention was effective or not.
For example, if a treatment has a positive effect on phys- 
ical health but a negative one on mental health, unless we 
know the relative importance attached to each domain, it 
is impossible to determine whether the intervention 
resulted in a net improvement or decline in QOL/HRQL. 
Furthermore, disease-specific utility measures can be used 
to calculate QALYs and make decisions on the cost- 
effectiveness of different treatments in MS. 

Figure 4 Frequency and distribution of PGI scores on the degree to which walking was affected from 0 (worst they can imagine) to 10
(exactly as they would like to be). Figure annotation: people who reported having no problems on the mobility item of the
EQ-5D reported their walking to be as low as 3 (poor) on the PGI.

A clinician reported outcome (ClinRO) is an assessment
of the status of a patient's health condition that is
made by an observer with professional training (i.e. clinician)
[69]. ClinROs are commonly used for endpoints
that cannot be directly measured by the patient (e.g.
EDSS to quantify level of disability in MS). An observer-reported
outcome (ObsRO) is an assessment that is
made by an observer without professional training (i.e.
non-clinician observer such as a teacher or caregiver)
[69]. This type of evaluation is typically used when the
patient is unable to self-report. A patient reported outcome
(PRO) is any report of the status of a patient's
health condition that comes directly from the patient,
without interpretation of the patient's response by a clinician
or other observer (e.g. symptoms, QOL, HRQL)
[68,69]. PROs play a complementary role in outcome assessment
by providing evidence on the benefit or harm
of a treatment from the patient's perspective. Utility
measures are one type of PRO. In outcome assessment,
utility measures not only provide information on the
benefits and harms of a treatment, but are also useful
for economic applications by producing QALYs. This information
can provide policy and decision makers with a
means of evaluating the costs and cost-effectiveness of
different treatment options for a health condition.

The first step in evaluating the validity of scores 
produced by a PRO is an assessment of content validity, 
before any other forms of validity (i.e. construct validity) 
are undertaken. Content validity of a PRO can be judged
only by the individuals or populations being assessed
(i.e. the patients themselves). The global aim of this study
was to address this very question of whether generic
utility measures captured domains that were important
or relevant to people with MS. The results of this study
suggest that many important domains in MS are not captured
by generic utility measures, therefore questioning
the content validity of such measures in MS. This, in turn,
casts doubt on the interpretability or meaningfulness of
scores produced by these measures for this population.

It is important to target measures to people to ensure 
that the impact of a disease and its treatment are ad- 
equately and reliably captured in a clinical trial [70,71]. 
If a PRO includes domains that are not impacted upon 
by the disease or its treatment, it will not be able to cap- 
ture clinically meaningful change. By targeting to the 
disease, measures are more likely to be sensitive to small 
but important clinical changes. Furthermore, the ability 
of PROs to detect small changes is important in deter- 
mining the statistical power or the necessary sample size 
required for a clinical trial [72].

The results of our study revealed that the 4 commonly
used generic utility measures (HUI2, HUI3, EQ-5D
and SF-6D) do not capture the majority of domains
important to MS. Among these generic measures, the
SF-6D captured the greatest number of domains (6 domains)
that were important to MS. Our findings suggest that
the SF-6D, compared to the other generic utility measures,
may be the most appropriate one to use in MS.
The PGI can be used to evaluate the clinical
effectiveness of different interventions in MS. However,
because the PGI was not developed using multi-attribute
utility theory (and hence is not a utility measure), it cannot
be used for cost-utility analysis.

Ideas for future directions that build directly on this
work include the use of MS-specific "bolt-on" items or
dimensions for generic utility measures [73]. This study has
identified potential items important to MS, such as
fatigue, that can be used as add-ons to existing generic
utility measures. Another area of potential research that
can build directly on this work is the development of
an MS-specific utility measure that will include only
dimensions pertinent to the disease.

A particular feature of this study is that we purposely 
sampled people with MS diagnosed in the era of 
Magnetic Resonance Imaging (MRI) technology and 
availability of disease modifying drugs [48]. As these are 
the people who are faced with treatment decisions, a 
method of valuing changes on the most important do- 
mains of QOL affected by MS would be the most rele- 
vant for this population. 

Conclusions 

Generic utility measures are designed to include a com- 
mon set of dimensions that most people will value 
highly, therefore underrepresenting those dimensions 
that may be specific to a particular disease. Although the 
generic utility measures included certain items that were 
important to people with MS, there were several that 
were missing. An important consequence of this mis- 
match was that values of QOL derived from the PGI 
were importantly and significantly lower than those esti- 
mated using any of the generic utility measures. This 
could have a substantial impact on evaluating the effect
of interventions in people with MS. The overestimation
in scores obtained with utility measures may not have an
impact at the start of a clinical trial, but it will have
an impact at follow-up. If scores are high at baseline,
there will likely be no room for improvement on the
scale, resulting in the false conclusion that the treatment
group did not change post-treatment, when in reality
the treatment may have had a positive effect but the
measure being administered was not able to detect it.
The difference between the treatment and control
groups (assuming the control group also does not change)
would then be zero. In addition, an intervention that is in
fact beneficial to fatigue, for example, would also risk
showing no change on a generic measure because this item
was not included. When choosing the right outcome
measure for an intervention, it is essential to choose one
with items that can or should be affected by the
intervention. Given that the MS-specific items do impact
on QOL, not including these items would result in a
false estimate of QALYs and bias the evaluation of the
cost-effectiveness of interventions in MS.